Synthesising short vowels from their long counterparts in a concatenative based text-to-speech system

Authors

  • Ove Andersen
  • Niels-Jørn Dyhr
  • Inger S. Engberg
  • Claus Nielsen
Abstract

Danish has a distinctive vowel length opposition that is realized with little difference in vowel quality. This paper investigates whether this fact can be used to reduce the size of the speech unit database in a high-quality concatenative text-to-speech system for Danish. The purpose is to evaluate the concept of using long vowels to synthesize the corresponding short vowels. If this proves successful, the size of the speech unit database may be reduced by approximately 40%. An acoustic analysis of the long and short vowels in the present speech unit database was performed. The results are presented in an F1-F2 plot and demonstrate a significant overlap between long and short vowels. Consequently, two different strategies for synthesizing the short vowels from their long counterparts were tested. The first strategy used resegmented long vowels, and the second relied entirely on the time-scaling technique built into the signal generation module. The two strategies were compared with pre-recorded short vowels in a comprehensive listening test, in which 32 subjects judged intelligibility and naturalness. The results show no significant differences between the pre-recorded short vowels and the resegmented long vowels synthesized as short vowels. The resegmented long vowels will be implemented in the present text-to-speech system for further testing.
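The abstract does not say how the resegmentation or the time-scaling was carried out, so the following is only a toy sketch of how the two strategies might differ, under stated assumptions. It assumes a long vowel is available as a sequence of pitch periods, that "resegmenting" means keeping a contiguous central stretch, and that "time-scaling" means keeping periods spread evenly over the whole vowel. The function names `resegment` and `time_scale` are hypothetical and are not taken from the paper.

```python
def resegment(periods, target_len):
    """Assumed reading of 'resegmenting': keep a contiguous,
    centred run of `target_len` pitch periods."""
    if target_len >= len(periods):
        return list(periods)
    start = (len(periods) - target_len) // 2
    return list(periods[start:start + target_len])


def time_scale(periods, target_len):
    """Assumed reading of 'time-scaling': keep `target_len` periods
    spread evenly across the whole vowel, dropping the rest."""
    n = len(periods)
    if target_len >= n:
        return list(periods)
    if target_len == 1:
        return [periods[n // 2]]
    step = (n - 1) / (target_len - 1)
    return [periods[round(i * step)] for i in range(target_len)]


# Stand-in labels for the 10 pitch periods of a long vowel.
long_vowel = list(range(10))
short_by_resegmenting = resegment(long_vowel, 4)   # [3, 4, 5, 6]
short_by_time_scaling = time_scale(long_vowel, 4)  # [0, 3, 6, 9]
```

In this sketch the resegmented version keeps only the steady central part of the vowel, while the time-scaled version retains the original onset and offset; whether that distinction matches the system described in the paper is an assumption.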

Similar resources

Acoustic Analysis of Persian EFL Learners' Pronunciation of English Vowels

This paper reports the results of an experimental study on non-native production of English vowels. Two groups of Persian EFL learners varying in language proficiency were tested on their ability to produce the nine plain vowels of American English. Vowel production accuracy was assessed by means of acoustic measurements. Ladefoged and Maddieson’s (1996) F1 F2 measurements for American English v...

Steps and Method of Preparing Syllable and Diphone Speech Databases for a Persian Text-to-Speech System

Speech databases are part of concatenative text-to-speech synthesis systems. The phonetic quality of the databases plays a significant role in the naturalness of the synthesized speech. This paper introduces syllable and diphone speech databases for Persian and describes how they were developed, their specifications, and their relative advantages. ...

Diphone-Based Concatenative Speech Synthesis System for Mongolian

This paper describes the first Text-to-Speech (TTS) system for the Mongolian language, using the general speech synthesis architecture of Festival. The TTS system is based on diphone concatenative synthesis, applying the TD-PSOLA technique. The conversion process from input text into acoustic waveform is performed in a number of steps consisting of functional components. Procedures and functions for the s...

The Short Vowels /i/ and /u/ in Iranian Balochi Dialects

The aim of the present paper is to study the status of the short vowels /i/ and /u/ in five selected Iranian Balochi dialects. These dialects are spoken in the Sistan (SI), Saravan (SA), Khash (KH), Iranshahr (IR), and Chabahar (CH) regions of Sistan va Baluchestan province in the southeast of Iran. This study investigates whether these two vowels have the same qualities as the short /i/ an...

Analysis of the degradation of French vowels induced by the TD-PSOLA algorithm, in text-to-speech context

In concatenative speech synthesis systems, synthetic speech is obtained by concatenating acoustic units selected from a database of natural speech. The duration and fundamental frequency (F0) of the selected units are usually different from those requested by a prosodic model, and so some prosodic modification must be applied to the units in order to obtain the desired target. TD-PSOLA is an ef...

Publication date: 1998